pairwise rna edit distance


  1. → Pairwise RNA Edit Distance
     • In the following:
       • sequences $S_1$ and $S_2$
       • associated structures $P_1$ and $P_2$
     • Scoring of alignment: different edit operations (arc match, arc mismatch, arc altering, arc removing, base match, base deletion)
     • Example (figure: two sequences with their structures and one alignment of them, annotated with the operations above):
         1) ACGUUGACUGACAACAC
            ..(((....))).....
         2) ACGAUCACGUACUAGCCUGAC
            ....(((.((....)).))).
         Alignment:
            ---.---.(((....)))...--..
            ---A---CGUUGACUGACAAC--AC
            ACGAUCACGU--ACUAGC--CUGAC
            ....(((.((--....))--.))).
     • Notation:
       • $S_k[i]$: position $i$ in sequence $k$ (for $k = 1, 2$)
       • $S_k[i]$ is free if no arc in $P_k$ is incident to $i$
     • Jiang et al., 2002:
       • above scoring scheme
       • complexity of different problem classes
       • algorithms
     S. Will, 18.417, Fall 2011
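
The notation above can be made concrete with a small sketch. It assumes structures are given as nested dot-bracket strings and positions are 1-based, as on the slide; the helper names `parse_arcs` and `is_free` are illustrative and not part of the lecture.

```python
def parse_arcs(structure):
    """Return the set of arcs (i, j), 1-based, of a nested dot-bracket string."""
    stack, arcs = [], set()
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            arcs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return arcs


def is_free(arcs, i):
    """S_k[i] is free if no arc in P_k is incident to position i."""
    return all(i not in arc for arc in arcs)


# Structure of sequence 1 from the slide.
P1 = parse_arcs("..(((....))).....")
```

Here `P1` contains the three stacked arcs of the hairpin, so position 1 is free while position 4 is not.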

  2. → Edit Distance – Scores
     • Base scoring: base mismatch $w_m$, base indel $w_d$.
     • Case 1: arc match and arc mismatch, for arcs $(i_1, i_2) \in P_1$ and $(j_1, j_2) \in P_2$
       • arc match (cost 0): $S_1[i_1] = S_2[j_1]$ and $S_1[i_2] = S_2[j_2]$
       • arc mismatch: $S_1[i_1] \neq S_2[j_1]$ or $S_1[i_2] \neq S_2[j_2]$
       • cost for mismatch:
         • if both ends differ: $w_{am}$
         • if only one end differs: $\frac{w_{am}}{2}$
     • In the following: different ways of deleting arcs. Cost: cost for deleting the arc + cost for base operations.
     • Case 2: arc breaking
       • $(i_1, i_2)$ is in $P_1$, but $(j_1, j_2)$ is not in $P_2$
       • cost: $w_b$ + possibly $2 \cdot w_m$
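
As a minimal sketch of case 1, the cost of aligning two arcs to each other can be written as a function of how many arc ends mismatch. The function names and the value `w_am = 2.0` are assumptions chosen for illustration; positions are 1-based as on the slide.

```python
def chi(S1, S2, i, j):
    """1 if S1[i] differs from S2[j] (1-based positions), 0 otherwise."""
    return 0 if S1[i - 1] == S2[j - 1] else 1


def arc_pair_cost(S1, S2, arc1, arc2, w_am):
    """Cost of matching arc1 = (i1, i2) in P1 with arc2 = (j1, j2) in P2.

    0 for an arc match, w_am / 2 if one end differs, w_am if both ends differ.
    """
    (i1, i2), (j1, j2) = arc1, arc2
    return (chi(S1, S2, i1, j1) + chi(S1, S2, i2, j2)) * w_am / 2


w_am = 2.0  # hypothetical weight
match = arc_pair_cost("GAAC", "GAAC", (1, 4), (1, 4), w_am)
one_end = arc_pair_cost("GAAC", "GAAU", (1, 4), (1, 4), w_am)
both_ends = arc_pair_cost("GAAC", "CAAG", (1, 4), (1, 4), w_am)
```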

  3. → Edit Distance – Scores (Cont.)
     • Case 3: arc altering
       • cost: $w_a$ + possibly $w_m$
     • Case 4: arc removing
       • cost: $w_r$
     • Remark: arc breaking/altering/removal can overlap.

  4. → Edit Distance – Scores Summary
     • Operations on single bases:
       • base insertion/deletion ($w_d$)
       • base mismatch ($w_m$)
     • Operations that act on both ends of an arc:
       1. arc mismatch ($w_{am}$)
       2. arc breaking ($w_b$)
       3. arc altering ($w_a$)
       4. arc removing ($w_r$)
     • Example:
         1234567890123456
         (..)((.(.)))(..)
         CCGGAGGCCGCUCCCG
         CCG-ACCC-CGU-CC-
         (.).((....))....

  5. → Plan
     1. The Jiang algorithm solves the edit problem under the following restrictions:
        • non-crossing (aka nested, aka pseudoknot-free) input structures¹
        • pairwise alignment only
        • scoring restricted by $w_a = \frac{w_r + w_b}{2}$
     2. Show MAX-SNP-hardness without the restriction $w_a = \frac{w_r + w_b}{2}$.
     ¹ Actually, we will see that crossing in at most one structure is OK.

  6. → Restriction $w_a = \frac{w_r + w_b}{2}$
     • Arc altering is like arc removing at one end and like arc breaking at the other end.
     • The restriction $w_a = \frac{w_r + w_b}{2}$ captures that.
     ⇒ Left and right ends of arcs can be scored independently if they are broken, deleted or altered.
     ⇒ Use a cost for arc end deletion $w_d^{end}$ and arc end breaking $w_b^{end}$ instead of $w_r$, $w_b$, and $w_a$:
       • $w_b = 2 \cdot w_b^{end}$
       • $w_r = 2 \cdot w_d^{end}$
       • $w_a = \frac{w_r + w_b}{2} = w_b^{end} + w_d^{end}$

  7. → Independent Arc Scoring
     • Cost for arc end deletion $w_d^{end}$ and arc end breaking $w_b^{end}$:
       • arc breaking: $w_b = 2 \cdot w_b^{end}$
       • arc removing: $w_r = 2 \cdot w_d^{end}$
       • arc altering: $w_a = w_b^{end} + w_d^{end}$
     • Hence: the cost of breaking or removing one end of an arc is independent of whether the other end is broken/removed or not. Only the cost of matching one end of an arc depends on whether the other end is matched, too.
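
The independence of the two arc ends can be illustrated with a short sketch in which each end of a non-matched arc is scored on its own. The operation labels, the function name `arc_cost`, and the weights `w_b = 1.0`, `w_r = 1.5` are assumptions for illustration only.

```python
END_BREAK, END_DELETE = "break", "delete"


def arc_cost(left_op, right_op, w_b_end, w_d_end):
    """Score a non-matched arc as the sum of two independent end costs."""
    end_cost = {END_BREAK: w_b_end, END_DELETE: w_d_end}
    return end_cost[left_op] + end_cost[right_op]


# Hypothetical arc weights; the end costs follow from w_b = 2 * w_b_end
# and w_r = 2 * w_d_end.
w_b, w_r = 1.0, 1.5
w_b_end, w_d_end = w_b / 2, w_r / 2

breaking = arc_cost(END_BREAK, END_BREAK, w_b_end, w_d_end)    # equals w_b
removing = arc_cost(END_DELETE, END_DELETE, w_b_end, w_d_end)  # equals w_r
altering = arc_cost(END_BREAK, END_DELETE, w_b_end, w_d_end)   # equals (w_r + w_b) / 2
```

With per-end scoring, the altering cost is forced to be the average of the breaking and removing costs, which is exactly the restriction on $w_a$.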

  8. → Example
     • Cost for arc end deletion $w_d^{end}$ and arc end breaking $w_b^{end}$:
       • arc breaking: $w_b = 2 \cdot w_b^{end}$
       • arc removing: $w_r = 2 \cdot w_d^{end}$
       • arc altering: $w_a = w_b^{end} + w_d^{end}$
     • Example:
         1234567890123456
         (..)((.(.)))(..)
         CCGGAGGCCGCUCCCG
         CCG-ACCC-CGU-CC-
         (.).((....))....

  9. → How to make a DP algorithm for alignment?
     • Dynamic programming ⇒ compute the optimal alignment recursively from optimal alignments of "fragments".
     • Questions to answer:
       • What kind of "fragments" do we consider? (⇒ semantics of a matrix entry)
       • How do we compute the solutions for all these fragments? (⇒ recursion equation)
       • complexity
       • details (evaluation order, implementation details, ...)

  10. → Semantics of DP entry $D(i, i', j, j')$
     • $D(i, i', j, j')$ is the minimum cost of aligning the fragment $[i, i']$ of the first sequence to the fragment $[j, j']$ of the second sequence, given that no arcs are matched that have one end inside these fragments and one end outside.
     • Remarks:
       • The additional restriction makes the alignment of the fragments independent of the alignment of the remaining parts.
       • We will see later why it is not sufficient to look at (alignments of) prefixes, as is done for plain sequence alignment.

  11. → Recursion for $D(i, i', j, j')$
     $$
     D(i, i', j, j') = \min
     \begin{cases}
     D(i, i'-1, j, j') + w_d + \psi_1(i')\,(w_d^{end} - w_d) \\
     D(i, i', j, j'-1) + w_d + \psi_2(j')\,(w_d^{end} - w_d) \\
     D(i, i'-1, j, j'-1) + \chi(i', j')\, w_m + (\psi_1(i') + \psi_2(j'))\, w_b^{end} \\
     D(i, i_1-1, j, j_1-1) + D(i_1+1, i'-1, j_1+1, j'-1) + (\chi(i_1, j_1) + \chi(i', j'))\, \frac{w_{am}}{2} \\
     \qquad \text{if } \exists\, (a_1, a_2) = ((i_1, i'), (j_1, j')) \in P_1 \times P_2 \text{ for some } i_1, j_1
     \end{cases}
     $$
     • Notation:
       • $\psi_1(i) = 1$ if $i$ is paired in structure 1, 0 otherwise ($\psi_2(i)$ analogous).
       • $\chi(i, j) = 1$ if $S_1[i] \neq S_2[j]$, 0 otherwise.
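
The recursion for $D$ can be sketched directly as a memoized function. This is an unoptimized illustration, not the Jiang algorithm: it assumes nested dot-bracket input, treats a fragment with $i' = i - 1$ as empty (cost 0 when both fragments are empty), and uses hypothetical default weights. The names `right_to_left` and `edit_distance` are illustrative.

```python
from functools import lru_cache


def right_to_left(structure):
    """Map each arc's right end to its left end (1-based, nested dot-bracket)."""
    stack, partner = [], {}
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            partner[pos] = stack.pop()
    return partner


def edit_distance(S1, P1, S2, P2, w_d=1.0, w_m=1.0, w_am=1.0,
                  w_b_end=0.5, w_d_end=0.75):
    """Evaluate D(1, |S1|, 1, |S2|); all default weights are hypothetical."""
    left1, left2 = right_to_left(P1), right_to_left(P2)
    paired1 = set(left1) | set(left1.values())
    paired2 = set(left2) | set(left2.values())

    def psi1(i):
        return 1 if i in paired1 else 0

    def psi2(j):
        return 1 if j in paired2 else 0

    def chi(i, j):
        return 0 if S1[i - 1] == S2[j - 1] else 1

    @lru_cache(maxsize=None)
    def D(i, ie, j, je):
        if ie < i and je < j:          # both fragments empty
            return 0.0
        options = []
        if ie >= i:                    # S1[i'] aligned to a gap
            options.append(D(i, ie - 1, j, je)
                           + w_d + psi1(ie) * (w_d_end - w_d))
        if je >= j:                    # S2[j'] aligned to a gap
            options.append(D(i, ie, j, je - 1)
                           + w_d + psi2(je) * (w_d_end - w_d))
        if ie >= i and je >= j:
            # Bases aligned; incident arcs (if any) are broken at this end.
            options.append(D(i, ie - 1, j, je - 1) + chi(ie, je) * w_m
                           + (psi1(ie) + psi2(je)) * w_b_end)
            # Arc (i1, i') matched with arc (j1, j'), both inside the fragments.
            i1, j1 = left1.get(ie), left2.get(je)
            if i1 is not None and j1 is not None and i1 >= i and j1 >= j:
                options.append(D(i, i1 - 1, j, j1 - 1)
                               + D(i1 + 1, ie - 1, j1 + 1, je - 1)
                               + (chi(i1, j1) + chi(ie, je)) * w_am / 2)
        return min(options)

    return D(1, len(S1), 1, len(S2))
```

For example, aligning `GAAAC` with structure `(...)` to the same sequence without the arc costs two arc-end breaks, i.e. $2 \cdot w_b^{end} = w_b$ under the assumed weights. The sketch relies on Python recursion and is only meant for short inputs.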

  12. → An optimized version: Jiang Algorithm
     • $D(i, i', j, j')$: alignment of subsequences
     • In principle: all regions $[i..i']$ and $[j..j']$ ⇒ $O(n^2 m^2)$ space
     • But: not all entries are considered.
       (Figure: the fragments considered start directly after the left arc ends, at $a_1^l + 1$ and $a_2^l + 1$, and end at $i$ and $j$.)
     • Hence: $O(nm)$ matrices $M_{a_1 a_2}$, one for each pair of arcs $a_1$, $a_2$. Each matrix has $O(nm)$ entries $M_{a_1 a_2}(i, j)$.

  13. → Jiang Recursion
     • Reformulated recursion:
     $$
     M_{a_1 a_2}(i, j) = \min
     \begin{cases}
     M_{a_1 a_2}(i-1, j) + w_d + \psi_1(i)\,(w_d^{end} - w_d) & \text{($S_1[i]$ aligned to gap)} \\
     M_{a_1 a_2}(i, j-1) + w_d + \psi_2(j)\,(w_d^{end} - w_d) & \text{($S_2[j]$ aligned to gap)} \\
     M_{a_1 a_2}(i-1, j-1) + \chi(i, j)\, w_m + (\psi_1(i) + \psi_2(j))\, w_b^{end} & \text{(bases aligned, bonds broken)} \\
     M_{a_1 a_2}(i'-1, j'-1) + M_{a'_1 a'_2}(i-1, j-1) + (\chi(i', j') + \chi(i, j))\, \frac{w_{am}}{2} & \text{(arcs $a'_1 = (i', i)$ and $a'_2 = (j', j)$ matched)}
     \end{cases}
     $$

  14. → Complexity
     • Time complexity: $O(nm)$ arc pairs × $O(nm)$ for the alignment below the arcs = $O(n^2 m^2)$ time
     • Remaining question: space complexity
       • Each entry of some $M_{a_1 a_2}$ only depends on
         • other entries of the same matrix $M_{a_1 a_2}$
         • and final entries of arc pairs of smaller arcs.
         (Figure: the matrix for arcs $a_1 = (a_1^l, a_1^r)$ and $a_2 = (a_2^l, a_2^r)$ covers positions $a_1^l + 1$ to $a_1^r - 1$ and $a_2^l + 1$ to $a_2^r - 1$.)
       ⇒ Store the final values in a separate $O(nm)$ matrix $F$ (in the recursion, replace the lookup $M_{a'_1 a'_2}(i-1, j-1)$ by $F(a'_1, a'_2)$).
       ⇒ It suffices to keep only $F$ and one $M_{a_1 a_2}$ in memory simultaneously.
     • Compute all $M_{a_1 a_2}$ in increasing order of the sizes of $a_1$ and $a_2$.

  15. → Complexity
     • Matrix $F$: $O(nm)$ space
     • Only one matrix $M_{a_1 a_2}$ at a time: $O(nm)$ space
       • Argument: for computing one entry $M_{a_1 a_2}(i, j)$, recurse only to $F(a'_1, a'_2)$ for "smaller" $a'_1$, $a'_2$, or to entries of the same matrix $M_{a_1 a_2}$.
       • Consequence: reuse the space for $M_{a_1 a_2}$.
     • TOTAL: $O(nm + nm) = O(nm)$ space
     • Drawback: traceback requires recomputation, but only $O(\min(n, m))$ many matrices $M_{a_1 a_2}$ need to be recomputed.
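
The space-saving scheme described above (one matrix $M_{a_1 a_2}$ at a time plus the table $F$ of final values) can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: both inputs are nested dot-bracket strings, a virtual arc $(0, n+1)$ around each whole sequence is added as a device of this sketch so that the full alignment is just another arc pair, the default weights are hypothetical, and traceback is omitted.

```python
def parse_arcs(structure):
    """List of arcs (left, right), 1-based, of a nested dot-bracket string."""
    stack, arcs = [], []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            arcs.append((stack.pop(), pos))
    return arcs


def jiang_distance(S1, P1, S2, P2, w_d=1.0, w_m=1.0, w_am=1.0,
                   w_b_end=0.5, w_d_end=0.75):
    n, m = len(S1), len(S2)
    real1, real2 = parse_arcs(P1), parse_arcs(P2)
    left1 = {r: l for l, r in real1}      # right end -> left end
    left2 = {r: l for l, r in real2}
    paired1 = {e for arc in real1 for e in arc}
    paired2 = {e for arc in real2 for e in arc}
    # Smaller arcs first; the virtual arcs (0, n+1), (0, m+1) come last.
    arcs1 = sorted(real1 + [(0, n + 1)], key=lambda a: a[1] - a[0])
    arcs2 = sorted(real2 + [(0, m + 1)], key=lambda a: a[1] - a[0])

    F = {}                                # final values per arc pair
    for l1, r1 in arcs1:
        for l2, r2 in arcs2:
            rows, cols = r1 - l1 - 1, r2 - l2 - 1
            # M[x][y] aligns S1[l1+1 .. l1+x] with S2[l2+1 .. l2+y].
            M = [[0.0] * (cols + 1) for _ in range(rows + 1)]
            for x in range(rows + 1):
                for y in range(cols + 1):
                    if x == 0 and y == 0:
                        continue
                    i, j = l1 + x, l2 + y
                    best = float("inf")
                    if x > 0:             # S1[i] aligned to a gap
                        extra = (w_d_end - w_d) if i in paired1 else 0.0
                        best = min(best, M[x - 1][y] + w_d + extra)
                    if y > 0:             # S2[j] aligned to a gap
                        extra = (w_d_end - w_d) if j in paired2 else 0.0
                        best = min(best, M[x][y - 1] + w_d + extra)
                    if x > 0 and y > 0:
                        chi_ij = 0 if S1[i - 1] == S2[j - 1] else 1
                        ends = (i in paired1) + (j in paired2)
                        best = min(best, M[x - 1][y - 1] + chi_ij * w_m
                                   + ends * w_b_end)
                        ip, jp = left1.get(i), left2.get(j)
                        if (ip is not None and jp is not None
                                and ip > l1 and jp > l2):
                            chi_left = 0 if S1[ip - 1] == S2[jp - 1] else 1
                            best = min(best, M[ip - 1 - l1][jp - 1 - l2]
                                       + F[(ip, i), (jp, j)]
                                       + (chi_left + chi_ij) * w_am / 2)
                    M[x][y] = best
            F[(l1, r1), (l2, r2)] = M[rows][cols]
    return F[(0, n + 1), (0, m + 1)]
```

Only `F` and the current `M` are alive at any time, which mirrors the $O(nm)$ space argument; because arcs are processed in increasing size, every `F` entry for an inner arc pair already exists when it is looked up.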

  16. → What about Pseudoknots?
     • Why doesn't the algorithm work for pseudoknots?
       ⇒ The last recursion case does not cover cases where matched arcs cross (compare Nussinov).
     • Only the matching of crossing arcs is a problem.
       ⇒ Pseudoknots in only one of the structures are OK.
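
What "crossing" means for two arcs can be stated as a small predicate. This sketch assumes arcs are given as `(left, right)` position pairs with distinct ends; the function names are illustrative.

```python
from itertools import combinations


def arcs_cross(a, b):
    """True if the two arcs interleave, i.e. i < k < j < l after ordering."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def crossing_pairs(arcs):
    """All pairs of arcs in a structure that cross each other."""
    return [(a, b) for a, b in combinations(sorted(arcs), 2)
            if arcs_cross(a, b)]
```

A structure is pseudoknot-free exactly when `crossing_pairs` returns an empty list for it.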

  17. → The alignment hierarchy
     • Alignment approaches have different limitations concerning
       • the two input structures
       • the common superstructure (e.g. for tree alignment ⇒ nested)
       • the set of edit operations
     • The alignment hierarchy classifies alignment problems as input1 × input2 → superstructure, with input1, input2 and superstructure each being one of
       • plain: only plain sequence (no base pairs at all)
       • nest: only nested structures (no pseudoknots)
       • cross: crossing structures (pseudoknots)
       • unlim: unlimited, also several base pairs per base possible
     • Examples:
       • cross × nest → unlim: Jiang algorithm
       • nest × nest → nest: tree alignment
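
The four structure classes of the hierarchy can be told apart with a few lines of code. This sketch assumes a structure is given as a collection of `(left, right)` arcs and returns the most restrictive class that fits; the function name `structure_class` is illustrative.

```python
from itertools import combinations


def structure_class(arcs):
    """Classify a set of arcs as 'plain', 'nest', 'cross' or 'unlim'."""
    arcs = sorted(arcs)
    if not arcs:
        return "plain"                     # no base pairs at all
    ends = [end for arc in arcs for end in arc]
    if len(ends) != len(set(ends)):
        return "unlim"                     # some base has several partners
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l:
            return "cross"                 # two arcs interleave
    return "nest"
```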
