Pairwise RNA Edit Distance

→ Pairwise RNA Edit Distance
• In the following:
  • sequences $S_1$ and $S_2$
  • associated structures $P_1$ and $P_2$
  • scoring of alignment: different edit operations
• Example (edit operations labeled in the original figure: arc altering, arc removing, base deletion, arc mismatch, base match, arc match):

    1) ACGUUGACUGACAACAC
       ..(((....))).....

    2) ACGAUCACGUACUAGCCUGAC
       ....(((.((....)).))).

    alignment:
       ---.---.(((....)))...--..
       ---A---CGUUGACUGACAAC--AC
       ACGAUCACGU--ACUAGC--CUGAC
       ....(((.((--....))--.))).

• Notation:
  • $S_k[i]$: position $i$ in sequence $k$ (for $k = 1, 2$)
  • $S_k[i]$ is free if there is no arc incident in $P_k$ to $i$
• Jiang et al., 2002:
  • above scoring scheme
  • complexity of different problem classes
  • algorithms
S. Will, 18.417, Fall 2011
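The notation above can be made concrete with a small Python sketch. It assumes structures are given as nested dot-bracket strings (as in the example) and uses 1-based positions; the helper names `parse_arcs` and `is_free` are hypothetical, chosen only for illustration.

```python
def parse_arcs(dot_bracket):
    """Return the arc set P of a nested dot-bracket string as 1-based (left, right) pairs."""
    stack, arcs = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)          # remember the open left end
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            arcs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return arcs


def is_free(arcs, i):
    """S_k[i] is free if no arc in P_k is incident to position i."""
    return all(i not in arc for arc in arcs)


# Structure of sequence 1 from the example slide.
P1 = parse_arcs("..(((....))).....")
```

With this structure, positions 3, 4, 5 pair with positions 12, 11, 10, and all remaining positions are free.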

→ Edit Distance – Scores
• base scoring: base mismatch $w_m$, base indel $w_d$
• case 1: arc match and arc mismatch, for arcs $(i_1, i_2) \in P_1$ and $(j_1, j_2) \in P_2$ whose ends are aligned to each other
  • arc match (cost 0): $S_1[i_1] = S_2[j_1]$ and $S_1[i_2] = S_2[j_2]$
  • arc mismatch: $S_1[i_1] \neq S_2[j_1]$ or $S_1[i_2] \neq S_2[j_2]$
  • cost for mismatch:
    • if both ends differ: $w_{am}$
    • if only one differs: $\frac{w_{am}}{2}$
• in the following: different ways of deleting arcs
  • cost: cost for deleting the arc + cost for base operations
• case 2: arc breaking
  • $(i_1, i_2)$ is in $P_1$, but $(j_1, j_2)$ is not in $P_2$
  • cost: $w_b$ + possibly $2 \cdot w_m$
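As an illustration of case 1, the arc match/mismatch cost can be written as the number of differing arc ends times $\frac{w_{am}}{2}$. This is a minimal sketch; the function name `arc_pair_cost` and the 1-based tuple representation of arcs are assumptions made for the example.

```python
def arc_pair_cost(s1, s2, arc1, arc2, w_am):
    """Cost of aligning arc1 = (i1, i2) in S1 to arc2 = (j1, j2) in S2 (1-based).

    0 for an arc match, w_am / 2 if one end differs, w_am if both ends differ.
    """
    (i1, i2), (j1, j2) = arc1, arc2
    differing_ends = int(s1[i1 - 1] != s2[j1 - 1]) + int(s1[i2 - 1] != s2[j2 - 1])
    return differing_ends * w_am / 2
```

For example, with $w_{am} = 2$, aligning the arc G–C to A–C costs 1, and aligning it to A–U costs 2.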

→ Edit Distance – Scores (Cont.)
• case 3: arc altering (one end of the arc is aligned to a base, the other end to a gap)
  • cost: $w_a$ + possibly $w_m$
• case 4: arc removing (both ends of the arc are aligned to gaps)
  • cost: $w_r$
• remark: arc breaking/altering/removal can overlap

→ Edit Distance – Scores Summary
• operations on single bases:
  • base insertion/deletion ($w_d$)
  • base mismatch ($w_m$)
• operations that act on both ends of an arc:
  1. arc mismatch ($w_{am}$)
  2. arc breaking ($w_b$)
  3. arc altering ($w_a$)
  4. arc removing ($w_r$)
• Example:

    1234567890123456
    (..)((.(.)))(..)
    CCGGAGGCCGCUCCCG
    CCG-ACCC-CGU-CC-
    (.).((....))....

→ Plan
1. Jiang algorithm solves the edit problem given the following restrictions:
   • non-crossing (aka nested aka pseudoknot-free) input structures [1]
   • pairwise alignment only
   • scoring restricted by $w_a = \frac{w_r + w_b}{2}$
2. show MAX-SNP-hardness without the restriction $w_a = \frac{w_r + w_b}{2}$

[1] actually, we will see that crossing in at most one structure is OK

→ Restriction $w_a = \frac{w_r + w_b}{2}$
• Arc altering is at one end like arc removing and at the other end like arc breaking
• The restriction $w_a = \frac{w_r + w_b}{2}$ captures that
  ⇒ left and right ends of arcs can be scored independently if they are broken, deleted or altered
  ⇒ cost for arc end deletion $w_d^{\mathrm{end}}$ and arc end breaking $w_b^{\mathrm{end}}$ instead of $w_r$, $w_b$, and $w_a$:
    • $w_b = 2 \cdot w_b^{\mathrm{end}}$
    • $w_r = 2 \cdot w_d^{\mathrm{end}}$
    • $w_a = \frac{w_r + w_b}{2} = w_b^{\mathrm{end}} + w_d^{\mathrm{end}}$
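A small worked instance may help. The numbers $w_b = 4$ and $w_r = 6$ are chosen purely for illustration and do not come from the slides:

```latex
\begin{align*}
w_b = 4 \;&\Rightarrow\; w_b^{\mathrm{end}} = \tfrac{w_b}{2} = 2, \\
w_r = 6 \;&\Rightarrow\; w_d^{\mathrm{end}} = \tfrac{w_r}{2} = 3, \\
w_a &= \tfrac{w_r + w_b}{2} = \tfrac{6 + 4}{2} = 5 = w_b^{\mathrm{end}} + w_d^{\mathrm{end}} = 2 + 3 .
\end{align*}
```

Under the restriction, an altered arc is therefore charged exactly as one broken end plus one deleted end; any other value of $w_a$ could not be split into independent per-end costs in this way.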

→ Independent Arc Scoring
• cost for arc end deletion $w_d^{\mathrm{end}}$ and arc end breaking $w_b^{\mathrm{end}}$
  • arc breaking: $w_b = 2 \cdot w_b^{\mathrm{end}}$
  • arc removing: $w_r = 2 \cdot w_d^{\mathrm{end}}$
  • arc altering: $w_a = w_b^{\mathrm{end}} + w_d^{\mathrm{end}}$
• Hence: the cost of breaking or removing one end of an arc is independent of whether the other end is broken/removed or not. Only the cost of matching one end of an arc depends on whether the other end is matched, too.

→ Example
• cost for arc end deletion $w_d^{\mathrm{end}}$ and arc end breaking $w_b^{\mathrm{end}}$
  • arc breaking: $w_b = 2 \cdot w_b^{\mathrm{end}}$
  • arc removing: $w_r = 2 \cdot w_d^{\mathrm{end}}$
  • arc altering: $w_a = w_b^{\mathrm{end}} + w_d^{\mathrm{end}}$

    1234567890123456
    (..)((.(.)))(..)
    CCGGAGGCCGCUCCCG
    CCG-ACCC-CGU-CC-
    (.).((....))....

→ How to make a DP algorithm for alignment?
• dynamic programming ⇒ compute the optimal alignment recursively from optimal alignments of "fragments"
• questions to answer:
  • what kind of "fragments" do we consider? (⇒ semantics of a matrix entry)
  • how to compute the solutions for all these fragments? (⇒ recursion equation)
  • complexity
  • details (evaluation order, implementation details, ...)

→ Semantics of DP entry $D(i, i', j, j')$
• $D(i, i', j, j')$ is the minimum cost of aligning the fragment $[i, i']$ of the first sequence to the fragment $[j, j']$ of the second sequence, given that no arcs are matched that have one end inside these fragments and one end outside.
• Remarks
  • The additional restriction makes the alignment of the fragments independent of the alignment of the remaining parts.
  • We will see later why it is not sufficient to look at (alignments of) prefixes, as done for plain sequence alignment.

→ Recursion for $D(i, i', j, j')$
$$
D(i, i', j, j') = \min
\begin{cases}
D(i, i'-1, j, j') + w_d + \psi_1(i')\,(w_d^{\mathrm{end}} - w_d) \\[2pt]
D(i, i', j, j'-1) + w_d + \psi_2(j')\,(w_d^{\mathrm{end}} - w_d) \\[2pt]
D(i, i'-1, j, j'-1) + \chi(i', j')\, w_m + (\psi_1(i') + \psi_2(j'))\, w_b^{\mathrm{end}} \\[2pt]
D(i, i_1-1, j, j_1-1) + D(i_1+1, i'-1, j_1+1, j'-1) + (\chi(i_1, j_1) + \chi(i', j'))\, \frac{w_{am}}{2} \\
\qquad \text{if } \exists\, (a_1, a_2) = ((i_1, i'), (j_1, j')) \in P_1 \times P_2 \text{ for some } i_1, j_1
\end{cases}
$$
Notation
• $\psi_1(i) = 1$ if $i$ is paired in structure 1, 0 otherwise ($\psi_2(j)$ analogous)
• $\chi(i, j) = 1$ if $S_1[i] \neq S_2[j]$, 0 otherwise
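The recursion can be sketched directly as a memoized Python function. This is a plain top-down version over all fragments, not the space-optimized Jiang algorithm. Several details are assumptions made for the sketch: positions are 1-based, arcs are `(left, right)` tuples with each base in at most one arc, empty fragments have cost 0, the arc-match case is only applied when both arcs lie inside the current fragments ($i_1 \ge i$, $j_1 \ge j$), and the default weights are illustrative values, not taken from the slides.

```python
from functools import lru_cache


def rna_edit_distance(s1, p1, s2, p2, w_m=1.0, w_d=1.0, w_am=1.5,
                      w_b_end=0.75, w_d_end=1.0):
    """Return D(1, n, 1, m) for sequences s1, s2 with arc sets p1, p2.

    Suitable for short inputs only: the recursion depth grows with n + m.
    """
    left1 = {r: l for (l, r) in p1}   # right end -> left end of its arc
    left2 = {r: l for (l, r) in p2}
    paired1 = {x for arc in p1 for x in arc}
    paired2 = {x for arc in p2 for x in arc}

    def psi1(i):
        return 1 if i in paired1 else 0

    def psi2(j):
        return 1 if j in paired2 else 0

    def chi(i, j):
        return 1 if s1[i - 1] != s2[j - 1] else 0

    @lru_cache(maxsize=None)
    def D(i, i_end, j, j_end):
        if i_end < i and j_end < j:          # both fragments empty
            return 0.0
        best = float("inf")
        if i_end >= i:                        # S1[i_end] aligned to a gap
            best = min(best, D(i, i_end - 1, j, j_end)
                       + w_d + psi1(i_end) * (w_d_end - w_d))
        if j_end >= j:                        # S2[j_end] aligned to a gap
            best = min(best, D(i, i_end, j, j_end - 1)
                       + w_d + psi2(j_end) * (w_d_end - w_d))
        if i_end >= i and j_end >= j:
            # bases aligned, incident arc ends (if any) are broken
            best = min(best, D(i, i_end - 1, j, j_end - 1)
                       + chi(i_end, j_end) * w_m
                       + (psi1(i_end) + psi2(j_end)) * w_b_end)
            # arc match/mismatch of arcs (i1, i_end) and (j1, j_end)
            i1, j1 = left1.get(i_end), left2.get(j_end)
            if i1 is not None and j1 is not None and i1 >= i and j1 >= j:
                best = min(best, D(i, i1 - 1, j, j1 - 1)
                           + D(i1 + 1, i_end - 1, j1 + 1, j_end - 1)
                           + (chi(i1, j1) + chi(i_end, j_end)) * w_am / 2)
        return best

    return D(1, len(s1), 1, len(s2))
```

With empty arc sets and $w_m = w_d = 1$ the function reduces to ordinary unit-cost sequence edit distance, which is a convenient sanity check. The memo table has up to $O(n^2 m^2)$ entries, which is the space cost the optimized version avoids.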

→ An optimized version: Jiang Algorithm
• $D(i, i', j, j')$: alignment of subsequences
• in principle: all regions $[i..i']$ and $[j..j']$ ⇒ $O(n^2 m^2)$ space
• But: not all entries are considered
  [Figure: only fragments below a pair of arcs $a_1$, $a_2$ are needed, starting directly after the left ends at $a_1^l + 1$ and $a_2^l + 1$ and ending at $i$ and $j$.]
• Hence: $O(nm)$ matrices $M^{a_1}_{a_2}$, one for each pair of arcs $a_1$, $a_2$. Each matrix has $O(nm)$ entries $M^{a_1}_{a_2}(i, j)$.

→ Jiang Recursion
• reformulated recursion:
$$
M^{a_1}_{a_2}(i, j) = \min
\begin{cases}
M^{a_1}_{a_2}(i-1, j) + w_d + \psi_1(i)\,(w_d^{\mathrm{end}} - w_d) & \text{($i$ aligned to gap)} \\[2pt]
M^{a_1}_{a_2}(i, j-1) + w_d + \psi_2(j)\,(w_d^{\mathrm{end}} - w_d) & \text{($j$ aligned to gap)} \\[2pt]
M^{a_1}_{a_2}(i-1, j-1) + \chi(i, j)\, w_m + (\psi_1(i) + \psi_2(j))\, w_b^{\mathrm{end}} & \text{(broken bond)} \\[2pt]
M^{a_1}_{a_2}(i'-1, j'-1) + M^{a'_1}_{a'_2}(i-1, j-1) + (\chi(i', j') + \chi(i, j))\, \frac{w_{am}}{2} & \text{(arcs $a'_1 = (i', i)$, $a'_2 = (j', j)$ matched)}
\end{cases}
$$

→ Complexity
• time complexity: $O(nm)$ arc pairs × $O(nm)$ alignment below arcs = $O(n^2 m^2)$ time
• remaining question: space complexity
  • each entry of some $M^{a_1}_{a_2}$ only depends on
    • other entries of the same matrix $M^{a_1}_{a_2}$
    • and final entries of arc pairs of smaller arcs
  [Figure: arcs $a_1$ and $a_2$ with left ends $a_1^l$, $a_2^l$ and right ends $a_1^r$, $a_2^r$; the matrix covers positions $a^l + 1$ to $a^r - 1$.]
  ⇒ store final values in a separate $O(nm)$ matrix $F$ (in the recursion, replace the lookup $M^{a'_1}_{a'_2}(i-1, j-1)$ by $F(a'_1, a'_2)$)
  ⇒ it suffices to keep only $F$ and one $M^{a_1}_{a_2}$ in memory simultaneously
• compute all $M^{a_1}_{a_2}$ ordered (increasing) according to the size of $a_1$ and $a_2$

→ Complexity
• Matrix $F$: $O(nm)$ space
• only one matrix $M^{a_1}_{a_2}$ at a time: $O(nm)$ space
  • argument: for computing one entry $M^{a_1}_{a_2}(i, j)$, recurse only to $F(a'_1, a'_2)$ for "smaller" $a'_1$, $a'_2$ or to entries of the same matrix $M^{a_1}_{a_2}$
  • consequence: reuse the space for $M^{a_1}_{a_2}$
• TOTAL: $O(nm + nm) = O(nm)$ space
• drawback: traceback requires recomputation, but only $O(\min(n, m))$ many matrices $M^{a_1}_{a_2}$ need to be recomputed
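The evaluation order behind the space argument can be sketched as follows. Sorting each arc set by span and looping over pairs is one way (an assumption of this sketch, not necessarily the order used in the lecture) to guarantee that every $F(a'_1, a'_2)$ for enclosed arcs is available before $M^{a_1}_{a_2}$ is filled. The names `arc_pair_order` and `encloses` are hypothetical.

```python
def encloses(outer, inner):
    """True if arc `inner` lies strictly below arc `outer`."""
    return outer[0] < inner[0] and inner[1] < outer[1]


def arc_pair_order(p1, p2):
    """List all arc pairs (a1, a2), smaller arcs first.

    An arc strictly enclosed by another has a strictly smaller span, so
    every pair of enclosed arcs appears before the enclosing pair.
    """
    def span(arc):
        return arc[1] - arc[0]

    arcs1 = sorted(p1, key=span)
    arcs2 = sorted(p2, key=span)
    return [(a1, a2) for a1 in arcs1 for a2 in arcs2]
```

In a full implementation, each pair in this list would get its matrix $M^{a_1}_{a_2}$ computed, its final entry copied into $F(a_1, a_2)$, and the matrix memory reused for the next pair.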

→ What about Pseudoknots?
• Why doesn't the algorithm work for pseudoknots?
  ⇒ the last recursion case does not cover cases where matched arcs cross (compare Nussinov)
• only matching of crossing arcs is a problem
  ⇒ pseudoknots in only one of the structures are OK

→ The alignment hierarchy
• Alignment approaches have different limitations concerning
  • the two input structures
  • the common superstructure (e.g. for tree alignment ⇒ nested)
  • the set of edit operations
• The alignment hierarchy classifies alignment problems as input1 × input2 → superstructure, with input1, input2 and superstructure each being one of
  • plain: only plain sequence (no base pairs at all)
  • nest: only nested structures (no pseudoknots)
  • cross: crossing structures (pseudoknots)
  • unlim: unlimited, also several base pairs per base possible
• Examples:
  • cross × nest → unlim: Jiang algorithm
  • nest × nest → nest: tree alignment
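The four structure classes can be illustrated with a small classifier. This is a sketch that returns the most restrictive class a given arc set falls into, assuming arcs are 1-based `(left, right)` tuples with `left < right`; the function name `classify_structure` is hypothetical.

```python
def classify_structure(arcs):
    """Return 'plain', 'nest', 'cross' or 'unlim' for a set of arcs."""
    arcs = sorted(set(arcs))
    if not arcs:
        return "plain"                      # no base pairs at all
    ends = [pos for arc in arcs for pos in arc]
    if len(ends) != len(set(ends)):
        return "unlim"                      # some base has several base pairs
    for idx, (i1, i2) in enumerate(arcs):
        for (j1, j2) in arcs[idx + 1:]:
            # arcs are sorted by left end, so i1 < j1 here
            if j1 < i2 < j2:
                return "cross"              # the two arcs cross (pseudoknot)
    return "nest"
```

Since the classes are nested (every plain structure is also nested, and so on), a structure classified as `nest` is also a valid input wherever `cross` or `unlim` is allowed.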
