On the Combinatorics of RNA Secondary Structures in a Polymer-Zeta Model


  1. On the Combinatorics of RNA Secondary Structures in a Polymer-Zeta Model. Markus E. Nebel, based on joint work with Emma Yu Jin. CanaDAM 2013, Newfoundland, Canada. (Slide footer: Markus E. Nebel, RNA in Polymer-Zeta Model, 2013/18/5, 1 / 17.)

  2. Plan of Talk
     1. RNA Secondary Structure: basic definitions; enumeration; polymer-zeta model (motivation and definition).
     2. Enumeration in the Polymer-Zeta Model: fundamentals; average number of hairpins.
     3. Overview of Results and Discussion.

  5. RNA Secondary Structure. From an abstract point of view, RNA molecules of size $n$ consist of
     1. a linear chain of $n$ nodes (≡ nucleotides) labeled with letters from $\{a, c, g, u\}$, which yields a string $s \in \{a, c, g, u\}^n$ called the RNA sequence;
     2. nodes that may each be part of at most one edge, connecting nodes at distance (in the chain) at least 2 (counted by hops).
     Secondary structure: edges (arcs) are not allowed to cross.

  6. RNA Secondary Structure (continued). Minimal distance: the edge connecting the orange nodes (highlighted in the slide's figure) is allowed.

  8. Enumeration. Enumerating secondary structures is easy; their number is given by the following recurrence relation:
     $$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1).$$
     If we want to take sequence information into account, we can work with
     $$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1)\, \eta(k+1, n+1) \qquad (1)$$
     where $\eta(i,j)$ is the indicator which is 1 iff $s_i$ and $s_j$ are complementary.
     Random sequence: taking the expectation of eq. (1) replaces $\eta(i,j)$ by the so-called stickiness $p$ (the expectation of $\eta$), which corresponds to the probability that two random nucleotides are complementary.

  9. Enumeration (continued). With $\eta(i,j)$ replaced by the stickiness $p$, eq. (1) becomes a recurrence for the expected number $e(n)$ of secondary structures of a random sequence of size $n$:
     $$e(n+1) = e(n) + \sum_{0 \le k \le n-2} e(k)\, e(n-k-1) \times p.$$
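The two enumeration recurrences can be tabulated directly. The following is a minimal Python sketch, not code from the talk: the function name `count_structures`, the base case r(0) = 1 (the empty structure), and the sample stickiness p = 0.25 are illustrative assumptions.

```python
def count_structures(n_max, p=1):
    """Tabulate r(0..n_max) via r(n+1) = r(n) + p * sum_{0<=k<=n-2} r(k) r(n-k-1).

    With p = 1 this counts secondary structures; with 0 < p < 1 it gives the
    expected count e(n) for a random sequence with stickiness p.
    Assumption (not stated on the slides): r(0) = 1.
    """
    r = [1]
    for n in range(n_max):
        # range(0, n - 1) enumerates k = 0, ..., n - 2 and is empty for n < 2.
        paired = sum(r[k] * r[n - k - 1] for k in range(0, n - 1))
        r.append(r[n] + p * paired)
    return r


print(count_structures(7))          # [1, 1, 1, 2, 4, 8, 17, 37]
print(count_structures(7, p=0.25))  # expected counts for the assumed stickiness 0.25
```

Setting `p=1` recovers the purely combinatorial count, so a single routine covers both the first recurrence and its expected-value version.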

  11. Algorithmic challenge. Input: RNA sequence (cheap with today's lab techniques). Output: (predicted) RNA secondary structure (considered a good approximation of the 3D conformation). Prominent approach: dynamic programming, i.e. a table-filling algorithm:
      1. Process the input sequence $s_1 s_2 \cdots s_n$.
      2. $V(i,j)$ represents the minimal energy possible for a folding of the subsequence $s_i \cdots s_j$ subject to the $i$-th and $j$-th nucleotides being paired to each other; $W(i,j)$ gives the corresponding minimum without that restriction.
      3. This yields algorithms with $n^3$ runtime (a quadratic number of entries, each giving rise to linear time).
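To make the cubic table-filling scheme concrete, here is a deliberately simplified Python sketch. It is not the energy model used by real folding tools: as a stand-in, every complementary pair contributes an assumed energy of -1, the pairing rules (Watson-Crick plus g-u) are an assumption, and the name `min_energy` is hypothetical. Only the V/W structure and the minimal distance of 2 follow the slides.

```python
def min_energy(s):
    """Toy minimum-energy folding: each allowed pair contributes -1 (assumption).

    W[i][j] is the minimal energy of the half-open slice s[i:j]; the local
    value v plays the role of V(i, k), the minimum with s[i] and s[k] paired.
    """
    pairs = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}
    n = len(s)
    W = [[0.0] * (n + 1) for _ in range(n + 1)]  # slices shorter than 3 cannot pair
    for length in range(3, n + 1):               # quadratic number of entries ...
        for i in range(0, n - length + 1):
            j = i + length
            best = W[i + 1][j]                   # case: s[i] stays unpaired
            for k in range(i + 2, j):            # ... each scanned in linear time
                if (s[i], s[k]) in pairs:
                    v = -1.0 + W[i + 1][k]       # V(i, k): s[i] paired with s[k]
                    best = min(best, v + W[k + 1][j])
            W[i][j] = best
    return W[0][n]


print(min_energy("ggaacc"))  # -2.0: two nested g-c pairs
```

The inner loop over `k` is exactly the linear scan that a candidate list is meant to shorten.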

  13. Motivation for Polymer-Zeta Model. Observation: while computing an optimal folding for the subsequence $s_i \cdots s_j$, a pairing of $s_i$ and $s_k$ only needs to be considered if the pairing of $s_i$ and $s_k$ already implied a minimum while considering $s_i \cdots s_{j'}$, $j' < j$. Speedup: bookkeeping (a candidate list) of the $s_k$ observed in minimal pairings for smaller subsequences may reduce the number of combinations to be considered for each entry. Polymer-zeta property: the probability for the $i$-th and $j$-th nucleotides at distance $d = j - i + 1$ to form a pair is given by $p_d = \frac{b}{d^c}$ (for some constants $b > 0$, $c > 0$). This implies a candidate list of (expected) constant length and thus an algorithm with expected quadratic run time.

  15. Question addressed here. For certain classes of RNA (especially mRNA) it is justified to assume the polymer-zeta property. Question: is it appropriate in general? Approach: we compute the average shape of secondary structures (considered as combinatorial objects, thus no nucleotides, just size) assuming the polymer-zeta property, using methods from enumerative combinatorics, and compare it to statistics derived from native foldings (databases).

  17. Enumeration in the Polymer-Zeta Model. Model: study
      $$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1) \times p_{n-k},$$
      which, in analogy to the Bernoulli model, is the expected number of structures of size $n$, denoted $E_{c,b}\#(S_n)$. If we additionally compute the expected number $E_{c,b}\#(S_{n,k})$ of structures with parameter value $k$ (e.g. the number of so-called hairpins), then
      $$X_n^{c,b} = \sum_{k \ge 1} k \cdot \frac{E_{c,b}\#(S_{n,k})}{E_{c,b}\#(S_n)}$$
      is the averaged behavior of the parameter in consideration.
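The model recurrence can be evaluated numerically for small sizes. The sketch below is illustrative only: it assumes the base case r(0) = 1, takes $p_d = b/d^c$ from the polymer-zeta property, uses the weight index $n-k$ exactly as written in the recurrence, and the name `expected_structures` is hypothetical.

```python
def expected_structures(n_max, b, c):
    """Tabulate r(0..n_max) for r(n+1) = r(n) + sum_k r(k) r(n-k-1) * p_{n-k},
    with polymer-zeta weights p_d = b / d**c.

    Assumption (not stated on the slides): r(0) = 1.
    """
    r = [1.0]
    for n in range(n_max):
        paired = sum(
            r[k] * r[n - k - 1] * b / (n - k) ** c
            for k in range(0, n - 1)  # k = 0, ..., n - 2
        )
        r.append(r[n] + paired)
    return r


# Parameter grid considered in the talk: (c, b) in {1, 2}^2.
for c in (1, 2):
    for b in (1, 2):
        print(c, b, expected_structures(6, b, c))
```

Because floating-point arithmetic is used, the same routine also accepts non-integer exponents such as c = 1.5, although the generating-function analysis in the talk is restricted to integer c.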

  18. Enumeration in the Polymer-Zeta Model (continued). We considered $p_d = \frac{b}{d^c}$ for $(c,b) \in \{1,2\}^2$ (theoretical considerations imply $b = 1$, $c = 1.5$; fitting to mRNA data yields $c = 1.47$). Reason: our approach only allows integer values for $c$, since $p_d$ is introduced into our equations by the following trick on generating functions. Consider the operator $\Theta = \Theta(z) = z \frac{\partial}{\partial z}$, which satisfies $\Theta z^m = m z^m$. Then
      for $c = 1$: $\Theta\, \frac{b}{(n+1)^c} z^{n+1} = b z^{n+1}$;
      for $c = 2$: $\Theta^2\, \frac{b}{(n+1)^c} z^{n+1} = b z^{n+1}$.
      This way, we can derive appropriate differential equations for generating functions.
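The operator trick can be checked mechanically on truncated power series. The sketch below rests on one assumption about the slide's notation: since $\Theta = z\,\partial/\partial z$ maps $z^m$ to $m z^m$, the weight $b/(n+1)^c$ is taken to sit on the monomial $z^{n+1}$. The helper names `theta`, `weighted_series` and `theta_power` are hypothetical.

```python
from fractions import Fraction


def theta(series):
    """Apply Theta = z d/dz to a series stored as {exponent: coefficient}."""
    return {m: m * coef for m, coef in series.items()}


def weighted_series(b, c, terms):
    """Truncated series sum_n b / (n+1)^c * z^(n+1), for n = 0 .. terms-1."""
    return {n + 1: Fraction(b, (n + 1) ** c) for n in range(terms)}


def theta_power(series, c):
    """Apply Theta c times; each application multiplies the z^m term by m."""
    for _ in range(c):
        series = theta(series)
    return series


# After c applications every coefficient is exactly b again.
print(theta_power(weighted_series(2, 2, 5), 2))
```

Each application of `theta` removes one factor of $1/m$, which is why only integer exponents c fit this scheme.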

  19. Average Number of Hairpins. Theorem. Under the assumption of the $(c,b)$-polymer-zeta model, $c \in \{1,2\}$, the average number of hairpins in a secondary structure of size $n$ is asymptotically given by
      $$X_n^{1,b} = x_{1,b}\, n \left(1 + O(n^{-1/2})\right),$$
      $$X_n^{2,b} = x_{2,b}\, n \left(1 + O((\log n)^{-1})\right),$$
      where $x_{c,b} > 0$ is a constant and for $b \in \{1,2\}$ we have
      $x_{1,1} \approx 0.1326$, $x_{1,2} \approx 0.1476$, $x_{2,1} \approx 0.1238$, $x_{2,2} \approx 0.1489$.
